Improve explain by ordering predicates#11903
Conversation
65e4ce4 to
d92de39
Compare
There was a problem hiding this comment.
Can you please add a test to com.facebook.presto.tests.TestQueryPlanDeterminism that would fail without your change here?
There was a problem hiding this comment.
@kokosing Grzegorz, this functionality requires partitioned tables. TestQueryPlanDeterminism doesn't have any.
There was a problem hiding this comment.
Isn't TPCH "partitioned"? See orders.orderstatus, part.container and part.type
There was a problem hiding this comment.
@kokosing That's right. I didn't notice. I couldn't reproduce the issue using these tables though. In general, this has been tricky to reproduce. One repro we have involves a complicated query and a table with 7 partition keys.
d92de39 to
9ffd10b
Compare
de56f0e to
cf20092
Compare
|
@bmc99 Mithun, the code changes look good. Let's improve the commit message a bit. How about this: |
cf20092 to
f1d3d25
Compare
|
I have updated the commit message. |
Partition columns is a connector-specific concept (in particular, just Hive supports it). This engine-level change applies to any column that was pushed down into the connector during query planning. |
|
@martint Martin, thanks for clarifying. What would be a good way to describe this change then? Will this work? |
There was a problem hiding this comment.
This seems like fixing the symptoms. Do you know what is the root cause of non-determinism?
There was a problem hiding this comment.
I have considered fixing the predicate order at the place they are added as one of the option. But, I did not see a need for the the predicates to be in order anywhere else except in the explain plan. So, I have decided to fix it a that layer.
Are there any requirements/need to have the predicates ordered throughout? Can you please elaborate as to why you think there is a root issue other than explain plan?
There was a problem hiding this comment.
TableScanNode#getCurrentConstraint is used in multiple places in the code. It could be that predicate order matters in those places (e.g: when converting predicate back to filter expression). There are quite sophisticated methods that process TupleDomain.
There was a problem hiding this comment.
@sopel39 Karol, it feels like exploring what other places might need fixing is beyond the scope of this change.
There was a problem hiding this comment.
Maybe it's just TupleDomain itself. For instance I found TupleDomain#simplify which uses nondeterministic HashMap collector. Another one is: TupleDomain#intersect (uses HashMap). Another one: TupleDomain#columnWiseUnion (uses HashMap). Another one: TupleDomain#transform. Instead of HashMap we could use LinkedHashMap that preserves order.
There are probably few such places to be fixed. Then explicit sorting in explain should not be needed and plan itself would be deterministic (not just explain).
There was a problem hiding this comment.
This change doesn't improve plan stability (in actual plan partition columns are still in unspecified order, right?), but explain stability
There was a problem hiding this comment.
I agree that the change stabilizes the explain; Plan stability is not an issue as it relates to unspecified order of the predicates so the verbiage can be changed to use the word explain instead of plan.
f1d3d25 to
575b709
Compare
To improve explain plan stability (e.g. make sure explain plans for the same query match) print pushed-down predicates in alphabetical order.
575b709 to
e3d6e5a
Compare
martint
left a comment
There was a problem hiding this comment.
I agree with Karol that this is just fixing a symptom. We should fix the underlying cause, instead.
|
@mbasmanova I think in Presto we rely that immutable collections preserve insertion order quite a lot as it allows for plan determinism. That's why we use Guava immutable collections: |
|
@sopel39 Karol, I didn't know that. Thanks for explaining. |
|
Superseded by #12332 |
Fixes #11444